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Abstract: This paper describes a new approach to time series modeling that 
combines subject-matter knowledge of the system dynamics with statistical 
techniques in time series analysis and regression. Applications to American 
option pricing and the Canadian lynx data are given to illustrate this approach. 



1. Introduction 

In their Fisher Lectures at the Joint Statistical Meetings, Cox [llj and Lehmann 
[sH mentioned two major types of stochastic models in statistical analysis, namely, 
empirical and substantive (or mechanistic). Whereas substantive models are ex- 
planatory and related to subject-matter theory on the mechanisms generating the 
observed data, empirical models are interpolatory and aim to represent the observed 
data as a realization of a statistical model chosen largely for its flexibility, tractabil- 
ity and interpretability but not on the basis of subject-matter knowledge. Cox ^llj 
also mentioned a third type of stochastic models, called indirect models, that are 
used to evaluate statistical procedures or to suggest methods for analyzing com- 
plex data (such as hidden Markov models in image analysis). He noted, however, 
that the distinctions between the different types of models are important mostly 
when formulating and checking them but that these types are not rigidly defined, 
since "quite often parts of the model, e.g., those representing systematic variation, 
are based on substantive considerations with other parts more empirical." In this 
paper, we elaborate further the complementary roles of empirical and substantive 
models in time series analysis and describe a basis function approach to combining 
subject-matter (domain) knowledge with statistical mode ling techniques. 

This basis function approach was first developed in |29j for the valuation of 
American options. In Sections 2 and 3 we review the statistical and subject-matter 
models for option pricing in the literature as examples of empirical and substantive 
models in time series analysis. Section 4 describes a combined substantive-empirical 
approach via basis functions, in which the substantive component is associated with 
basis functions of a certain form, and the empirical component uses flexible and 
computationally convenient basis functions such as regression splines. The work 
of Lai and Wong (29| on option pricing and recent related work in financial time 
series are reviewed to illustrate this approach. Section 5 applies this approach to a 
widely studied data set in the nonlinear time series literature, namely, the Canadian 
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lynx data set that records the annual numbers of Canadian lynx trapped in the 
Mackenzie River district from 1821 to 1934. We use substantive models from the 
ecology literature together with multivariate adaptive regression splines to come up 
with a new time series model for these data. Some concluding remarks are given in 
Section 6. 



2. Statistical (empirical) time series models 

The development of statistical time series models in the past fifty years has wit- 
nessed a remarkable confluence of basic ideas from various areas in statistics and 
probability, coupled with the powerful influence from diverse fields of applications 
ranging from economics and finance to signal processing and control systems. The 
first phase of this development was concerned with stationary time series, leading to 
MA (moving average), AR (autoregressive) and ARMA representations in the time 
domain and transfer function representations in the frequency domain. This was 
followed by extensions to nonstationary time series, either by fitting (not necessarily 
stationary) ARMA models or by the Box- Jenkins approach involving the ARIMA 
(autoregressive integrated moving average) models and their seasonal SARIMA 
counterparts. More general fractional differencing then led to the ARFIMA mod- 
els. The next phase of the development was concerned with nonlinear time series 
models, beginning with bilinear models that add cross-product terms yt^iet-j to 
the usual ARMA model yt = /3ij/t-i + • ■ ■ + Ppyt-p + et + ciCf-i + • • • -I- Cq€t-q, and 
threshold autoregressive and regime switching models that introduce nonlineari- 
ties into the usual autoregressive models via state-dependent changes or Markov 



jumps in the autoregressive parameters. The monograph by Tong 4j| summarized 
these and other nonlinear time series models in the previous literature. The appro- 
priateness of the parametric forms assumed in these nonlinear time series models, 
however, may be difficult to justify in real applications, as pointed out by Chen and 
Tsay ^. 

Whereas the AR model yt = (3iyt-i+- ■ ■+/3pyt-p+et is related to hnear regression 
since /3^xt is the regression function E{yt\yit) of yt given (yt-i, ■ ■ ■ ,yt-pY ■, 

and likewise its nonlinear parametric extensions yt = f{x.t,(3) + et are related to 
nonlinear regression, Chen and Tsay [1, [3 

proposed to use nonparametric re- 
gression for E{yt\'x.t) instead. They started with functional-coefficient autoregres- 
sive (FAR) models of the form yt — /i(x( )j/t_i + • • • + fp[iit)yt~p + €t, where 
/i , . . . , /p are unspecified functions to be estimated by local linear regression and 
X( — {yt-ii , ■ ■ ■ , Vt-ij)'^ with ii < ■ ■ ■ < id chosen from {1, . . . Because of sparse 
data in high dimensions, local linear regression typically require to be 1 or 2. To 
deal with nonparametric regression in higher dimensions, they considered additive 
autoregressive models of the form yt = fi(yt-i^) + • ■ ■ + fd{yt~id) + ^t, in which the 
fi can be estimated nonparametrically via the generalized additive model (GAM) 
of Hastie and Tibshirani [l^ . Making use of Friedman's [l^l multivariate adap- 



tive splines (MARS), Lewis and Stevens [3J] and Lewis and Ray [3^, |33| developed 



spline models for empirical modeling of time series data. Weigend, Rummelhart 
and Huberman [4§| and Weigend and Gershenfeld [13 proposed to use neural net- 
works (NN) to model E{yt\xt), while Lai and Wong [28j considered a variant called 
stochastic neural networks, for which they could use the EM algorithm to develop 
efficient estimation procedures that have much lower computational complexity 
than those for conventional neural networks. 

The preceding time series models are autonomous, relating the dynamics of yt to 
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the past states. In econometrics and engineering, the outputs yt are related not only 
to the past outputs but also to the past inputs Ut-d, ■ ■ ■ ,ut-k- Therefore the AR 
model has been extended to the ARX model (where X stands for exogenous inputs) 
yt = f3'^xt + et with Xt = (yt-i, • ■ • , yt-p, Ut-d, • ■ • , Ut-kV ■ Instead of assuming a 
linear or nonlinear parametric regression model, one can use nonparametric regres- 
sion to estimate £'(j/t|xt), as in the following financial application. 

Example 1. As noted by Ross [40], option pricing theory is "the most successful 
theory not only in finance, but in all of economics." A call (put) option gives the 
holder the right to buy (sell) the underlying asset (e.g. stock) by a certain date 
T (known as the "expiration date" or "maturity") at a certain price (known as 
the "strike price" and denoted by K). European options can be exercised only on 
the expiration date, whereas American options can be exercised at any time up to 
the expiration date. The celebrated Black-Scholes theory, which will be reviewed in 
Section 3, yields the following pricing formulas for the prices ct and pt of European 
call and put options at time t e [0, T): 

(2.1) Q = 5te-'^(^~*)$(di(5t, if, T^t))- Ke-^(^-'^<S>id2iSuK, T - t)), 

(2.2) Pt = ife-'-(^"*)$(-d2(5t, if, T-t))- 5te-'^(^-*)$(-di(5t, if, T - t)), 

where $ is the cumulative distribution function of the standard normal random 
variable, St is the price of the underlying asset at time d is the dividend rate of the 
underlying asset, (ii(a;, y, u) — {\og{x/y) + (r — d + a"^ /2)v}/ay/v and d2{x,y,v) — 
di{x,y,v) — a\fv. Hutchinson, Lo and Poggio ^22] pointed out that the success of 
the formulas (|2.ip and (|2.2p depends heavily on the specification of the dynamics 
of St . Instead of using any particular model of St , they proposed a data-driven way 
for pricing and hedging with a minimal assumption: independent increments of the 
underlying asset price. Noting that yt (= Ct or pt) is function of St/ K and T ~ t 
with r and cr being constant, they assume yt = K f(St/ K,T — t) and approximate 
/ by taking xt = {St/K, T — t)^ in the following models: 

(i) radial basis function (RBF) networks /(x) = /3o + oc^x + Pihi{\\A{x — 
7i)||), where ^ is a positive definite matrix and hi is of the RBF type e~"^^'^i 
or {u^ + afy/^; 

(ii) neural networks /(x) = ■(/;(/?o-l-X]f=i A^(7i+Q:fx)), where ft.(u) = 1/(1+6""") 
is the logistic function and ^ is either the identity function or the logistic 
function; 

(iii) projection pursuit regression (PPR) networks /(x) — fjo + J2i=i Pihiiafx), 
where hi is an unspecified function that is estimated from the data by PPR. 

The Cki, [3i and 7j above are unknown parameters of the network that are to be 



estimated from the data. As pointed out in [2^, all three classes of networks have 
some form of "universal approximation property" which means their approximation 
bounds do not depend on the dimensionality of the predictor variable x] see It 
should be noted that the above transformation of St to St/K can be motivated not 
only from the assumption on St but also from the special feature of options data. 
Although the strike price K could be any positive number theoretically, the options 
exchange only sets strike prices at a multiple of a fundamental unit. For example, 
Chicago Board Options Exchange (CBOE) sets strike prices at multiples of $5 for 
stock prices in the $25 to $200 range. Also, only those options with strike prices 
closet to the current stock price are traded and thus their prices are observed. Since 
St is non-stationary in general, the observed K is also non-stationary. Such features 



196 



T. L. Lai and S. P.-S. Wong 



create sparsity of data in the space of {St, K,T — t). Training the options pricing 
formula in the form of f{St, K,T — t) can only interpolate the data and can hardly 
produce any good prediction because {St,K) in the future can be very different 
from the data used in estimating /. The proposed transformation makes use of 
the fact that all observed and future St/K are close to 1. Therefore, the proposed 
transformation captures the stationary structure of the data and enable the non- 
parametric models to predict well. Another point that Hutchinson, Lo and Poggio 
[22] highlighted is the measure of performance of the estimated pricing formula. 
According to their simulation study, even a linear f{St/K, T—t) can give E? k, 90% 
(Table I of Hutchinson, Lo and Poggio [23|). However, such a linear / implies a 
constant delta hedging scheme which would provide poor hedging results. Since the 
primary function of options is hedging the risk created by changes in the price of the 
underlying asset, Hutchinson, Lo and Poggio [23| suggested using, instead of i?^, the 
hedging error measures ^ = e^'''^ E[\V {T)\] and = e-'''^[EV'^{T)Y/'^ , where V{T) 
is the value of the hedged portfolio at time T. In a perfect Black-Scholes world, 
V{T) should be if Black-Scholes formula is used. However, from the simulation 
study, the Black-Scholes formulas still give ^ > and 77 > because time is discrete. 
Hutchinson, Lo and Poggio reported that RBF, NN and PPR all give hedging 
measures comparable to those of the Black-Scholes in the simulation study. For 
real data analysis of futures options, RBF, NN and PPR performed better than the 
Black-Scholes formula in terms of hedging. 

For American options, instead of using these learning networks to approximate 
the option price, Broadie et al. P| used kernel smoothers to estimate the option 
pricing formula of an American option. Using a training sample of daily closing 
prices of American calls on the S&PlOO Index that were traded on the Chicago 
Board Options Exchange from 3 January 1984 to 30 March 1990, they compared the 
nonparametric estimates of American call option prices at a set of {S/K, t*) values 
with corresponding parametric estimates obtained by using the approximations to 
American option prices due to Broadie and Detemple [4|, and found significant 
differences between the parametric and nonparametric estimates. 

3. Substantive (mechanistic) models 

In control engineering, the dynamics of linear input-output systems are often given 
by ordinary differential equations, whose discrete-time approximations in the pres- 
ence of noise have led to the ARX models (for white noise) , and ARMAX models 
(for colored noise) in the preceding section. The problem of choosing the inputs 
sequentially so that the outputs are as close as possible to some target values when 
the model parameters are unknown and have to be estimated on-line has a large 
literature under the rubric of stochastic adaptive control; see Goodwin, Ramadge 
and Gaines [l3| , Lai and Wei [27| , Lai and Ying and Guo and Ghen . More 
general dynamics in the presence of additive noise have led to stochastic differen- 
tial equations (SDEs), whose discrete-time approximations are related to nonlinear 
time series models described in the preceding section. One such SDE is geometric 
Brownian motion (GBM) for the asset price process in the Black-Scholes option 
pricing theory. In view of Ito's formula, the GBM dynamics for the asset price St 
translate into SDE dynamics for the option price f(t,St). Such implied dynamics 
from the mechanistic model can be combined with subject-matter theory to derive 
the functional form or differential equation for / and other important corollaries of 
the theory, as illustrated in the following. 
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Example 2. In the Black-Scholes model, the asset price St is assumed to be GBM 
defined by the SDE 

(3.1) dSt/ St = fJ^dt + adwt, 

where Wt,t > 0, is Brownian motion. Letting /(t, S) be the price of the option at 
time t when St = S, it follows from (|3.ip and Ito's formula that 

dfit, St) - f + %dSt + i <j^sfdt 

'% + i^St^ + ¥'s^Uyt + <ySt^dwt. 

For simplicity assume that the asset does not pay dividends, i.e., d — Q. Consider 
an option writer's portfolio at time t, consisting of —1 option and yt units of the 
asset. The value of the portfolio ttj is ~f{t, St) + ytSt and therefore 

Hence setting — df /dS yields a risk- free portfolio. This is the basis of delta 
hedging in the options theory of Black and Scholes jsj, who denote df/dS by A. 
Besides GBM dynamics for the asset price, the Black-Scholes theory also assumes 
that there are no transaction costs and no limits on short selling and that trading 
can take place continuously so that delta hedging is feasible. Since economic theory 
prescribes absence of arbitrage opportunities in equilibrium, nt that consists of — 1 
option and A units of the asset should have the same return as mtdt = r{^f + 
Stl^)dt, yielding the Black-Scholes PDE for /: 

with the boundary condition f{T,S) — g{S), where g{S) = {K — 5*)+ for a put 
option, and g{S) = (S — K)^ for a call option, where x+ = max(a;,0). This PDE 
has the explicit solution (|2.ip or (|2.2p with d = 0. If the asset pays dividend at rate 
d, then a modification of the preceding argument yields (|3.2p in which rS{df/dS) 
is replaced by (r - d)S{df/dS). 

Merton [37j extended the Black-Scholes theory for pricing European options to 
American options that can be exercised at any time prior to the expiration date. 
Optimal exercise of the option is shown to occur when the asset price exceeds (or 
falls below) an exercise boundary dC for a call (or put) option. The Black-Scholes 
PDE still holds in the continuation region C of {t,St) before exercise, and dC is 
determined by the free boundary condition df/dS = 1 (or —1) for a call (or put) 
option. Unlike the explicit formula (|2.ip or (|2.2p for European options, there is 
no closed-form solution of the free-boundary PDE and numerical methods such as 
finite differences are needed to compute American option prices under this theory. 

By the Feynman-Kac formula, the PDE p.2p has a probabilistic representation 
f{t,S) = E[e~^^'^~*'' g{ST)\St = S], and the expectation E is with respect to the 
"equivalent martingale measure" under which dSt/St = (r — d)dt + advot- This 
representation generalizes to American options as the value function of the optimal 
stopping problem 



(3.3) 



f{t,S)^ sup E[e-''''^--'^g{Sr)\St = S] 
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where Tt^T denotes the set of stopping times r taking values between t and T. 
Cox, Ross and Rubinstein [l2[ proposed to approximate GBM by a binomial tree, 
with root node So at time 0, so that (j3.3|) can be approximated by a discrete- 
time and discrete-state optimal stopping problem that can be solved by backward 
induction. Denote f{t,S) by C{t,S) for an American call option, and by P{t,S) 
for an American put option. Jacka [2^ and Carr, Jarrow and Myneni Q derived 
the decomposition formula 

P{t, S) = p{t, S) + KpeP'' f {e-P'<^(''^^^ 

(3.4) 



\/s- u 

gg-(ep.+u/2)+.<j,/£(£l_i _ ^j^A 1 



and a similar formula relating C{t,S) to c(t,S), where z{u) is the early exercise 
boundary dC under the transformation 

(3.5) p = r/a^, = d/r; u = a'^{t -T), z = logiS/K) -{p-Op- l/2)u. 

Ju (i^ l found that the early exercise premium can be computed in closed form if 
dC is a piecewise exponential function which corresponds to a piecewise linear z{u). 



By using such assumption, Ju 2J| reported numerical studies showing his method 
with 3 equally spaced pieces substantially improves previous approximations to 
option prices in both accuracy and speed. AitSahlia and Lai [l| introduced the 
transformation (|3.5p to reduce GBM to Brownian motion and showed that z{u) 
is indeed well approximated by a piecewise linear function with a few pieces. The 
integral obtained by differentiating that in p.4p with respect to S also has a closed- 
form expression when z(-) is piecewise linear, and approximating z(-) by a linear 
spline that uses a few unevenly spaced knots gives a fast and reasonably accurate 
method for computing A = dP/dS. 

The Black-Scholes price involves the parameters r and tr, which need to be 
estimated. The yield of a short-maturity Treasury bill is usually used for r. Although 
in the GBM model for asset prices which are observed at fixed intervals of time 
(e.g. daily), one can estimate a by the standard deviation of historical (daily) 
asset returns, which are i.i.d. normal under the GBM model for asset prices, there 
are issues due to departures from this model (e.g., a can change over time and 
asset returns are markedly non-normal) and due to violations of the Black-Scholes 
assumptions in the financial market (e.g., there are actually transaction costs and 
limits on short selhng). Section 13.4 and Chapter 16 of Hull [2l| discuss how the 
parameter a in the Black-Scholes option price is treated in current practice. In the 
next section we describe an alternative approach that addresses the discrepancy 
between the Black-Scholes-Merton theory and time series data on American options 
and the underlying stock prices. 



4. A combined substantive-empirical approach 

In this section we describe an approach to time series modeling that contains both 
substantiative and empirical components. We first came up with this approach when 
we studied valuation of American options. Its basic idea is to use empirical modeling 
to address the gap between the actual prices in the American options market and the 
option prices given by the Black-Scholes-Merton theory in Example 2, as explained 
below. 
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Example 3. For European options, instead of using the basis function of Hutchin- 
son, Lo and Poggio [22], an alternative approach is to express the option price as 
c + Ke^'^'*' f*{S/K,t*), where c is the Black-Scholes price p.ip because the Black- 
Scholes formula has proved to be quite successful in explaining empirical data. This 
is tantamount to including c(i, S) as one of the basis functions (with prescribed 
weight 1) to come up with a more parsimonious approximation to the actual option 
price. 

The usefulness of this idea is even more apparent in the case of American options. 
Focusing on puts for definiteness, the decomposition formula p.4p expresses an 
American put option price as the sum of a European put price p and the early 
exercise premium which is typically small relative to p. This suggests that p should 
be included as one of the basis functions (with prescribed weight 1). Lai and Wong 
[2^ propose to use additive regression splines after the change of variables u — 
—a'^{T—t) and z — \og{S/K). Specifically, for small T — t (say within 5 trading days 
prior to expiration, i.e. T — t < 5/253 under the assumption of 253 trading days per 
year), we approximate P by p. For T — t > 5/253 (or equivalently, u < — 5cr^/253), 
we approximate P by 



P = p + Ke''^{a + aiu + aij^j{u — u"' 



(4.1) + PiZ + I32Z^ + g (32+j{z - z(^^^)l + JiW + J2W^ 

J™ 

where p = r/a"^ as in p.5p . a, aj, f3j and 7^ are regression parameters to be 
estimated by least squares from the training sample and 

(4.2) w = \u\-^/'^{z-{p-ep-l/'2)u} {9 = d/r) 

is an "interaction" variable derived from z and u. The motivation behind the cen- 
tering term {p — 9p — l/2)u comes from (I3.5P that transforms GBM into Brown- 
ian motion, whereas that behind the normalization |it|^^/^ comes from p.4p and 
the closely related di{x,y,v) in (12. 2|) . The knots m^-'-' (respectively z*^^^ or w^^'') 
of the linear (respectively quadratic) spline in (|4.ip are the lOOj/Ju (respectively 
lOOj/Jz and lOOj'/ Ju,)-th percentiles of {ui, . . . , u„} (respectively {zi, . . . , z„} or 
{wi, . . . , Wn})- The choice of Ju, Jz and is over all possible integers between 1 
and 10 to minimize the generalized cross validation (GCV) criterion, which can be 
expressed in the following form (cf. [iqI. l46l|): 



GCV(Ju, Jz, Jw) 




Ju + Jz+ Jw+Q 



where the Pi are the observed American option prices in the past n periods, and 
the Pi are the corresponding fitted values given by (|4.ip in which the regression 
coefficients are estimated by least squares. 

In the preceding we have assumed prescribed constants 7 and a as in the Black- 
Scholes model; these parameters appear in (|4.ip via the change of variables (|3.5p . 
In practice a is unknown and may also vary with time. We can replace it in (|4.ip 



200 



T. L. Lai and S. P.-S. Wong 



by the standard deviation at of the most recent asset returns say, during the past 
60 trading days prior to t as in [^^j, P- 881. Moreover, the risk-free rate r may also 
change with time, and can be replaced by the yield fi of a short-maturity Treasury 
bill on the close of the month before t. The same remark also applies to the dividend 
rate. 

The simulation study in Lai and Wong shows the advantages of this combined 
substantive-empirical approach. Not only is P well approximated by P, especially 
over intervals of S/K values that occur frequently in the sample, A — A also reveals 
a pattern similar to that of P — P. Besides — E{e~^'^\Vp{T)\}, where r is the 
time of exercise and Vp{t) is the value of the replicating portfolio at time t that 
rebalances (according to the pricing formula P) between the risky and riskless assets 
([121, P- 868-869), Lai and Wong '2^ also consider the measure 

(4.3) ^p^E y\st/KfiAit) - Ait))'dt^ , 

where A = dP/dS. In practice, continuous rebalancing is not possible. If rebalanc- 
ing is done only daily, then (5/ K)'^{Aa — A)^ in (|4.3|) is replaced by a step function 
that stays constant on intervals of width 1/253. Because of the adaptive nature of 
the methodology, the proposed approach of Lai and Wong (29j is much more ro- 
bust to the misspecification error than the Black-Scholes formula in terms of both 
measures. Lai and Lim [26] carried out an empirical study of this approach and 
made use of its semiparametric pricing formula and (|4.3p to come up with a modi- 
fied Black-Scholes theory and optimal delta hedging in the presence of transaction 
costs. 



5. Application to the 1821-1934 Canadian lynx data 

The Canadian Lynx data set consists of the annual record of the numbers of the 
Canadian lynx trapped in the Mackenzie River district of the North-west Canada 
for the period 1821-1934 inclusively. Let Xt be logjQ(number recorded as trapped in 
year 1820 -I- 1) {t = 1, . . . , 114). Figure 1 shows the time series plot of Xt- According 
to Tong [31, Moran [s^ performed the first time series analysis on these data by 
fitting an AR(2) model to Xt; moreover, the log transformation is used because it 
(i) makes the marginal distribution of Xt more symmetric about its mean and (ii) 
reduces the approximation error in assuming the number of lynx to be proportional 
to the population. In view of the substantial non-linearity of i?[At|At_3] found in 
the scatterplot of A* versus Xts, Tong([44[, p. 361) critiques Moran's analysis and 
its enhancements by Campbell and Walker [61] , who added a harmonic component to 
the AR(2) model, and by Tong [i^, who used the AIC to select the order p = 11 for 
AR(p) models, as "uncritical acceptance of linearity" in Af . He uses a self-excited 
threshold autoregressive model (SETAR) of the form 

f5 1) A -A ^ [0-62 + 0.25At_i-0.43At_2+et ifAt_2<3.25 
* |-(1.24Ai_2 - 2.25) + 0.52At_i+et ifAt_2>3.25 

to fit these data, similar to Tong and Lim ([i^. Section 9). The growth rate Xt — 
Xt-i in the first regime (i.e., At_2 < 3.25) tends to be positive but small, which 
corresponds to a slow population growth. In the second regime (i.e., Af_2 > 3.25), 
Xt — Xt^i tends to be negative, corresponding to a decrease in population size. 
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Fig 1. Time series plot o/ logj^Q of the Canadian lynx series. 



Tong (_44], p. 377) interprets the fitted model as an "energy balance" between the 
population expansion and the population contraction, yielding a stable limit cycle 
with a 9-year period which is in good agreement with the observed asymmetric 
cycles. Motivated by Van der Pol's equation, Haggan and Ozaki [I^ proposed to fit 
another nonlinear time series model, namely, the exponential autoregressive model 



(5.2) 



Xt 



11 

E( 

j=i 



-l{Xt 



which gives a limit cycle of period 9.45 years. Lim [35| compares the prediction per- 
formance of these and other parametric models and concludes that Tong's SETAR 
model ranks the best among them. 

Taking a more nonparametric approach. Fan and Yao [3] use a functional — 
coefficient autoregressive model to fit the observed Xt series and compare its pre- 
diction with that of threshold autoregression. Specifically, they fit the FAR(2,2) 
model 



(5.3) 



Xt = ai{Xt-2)Xt-i + a2{Xt-2)Xt-2 + ere* 



to the first 102 observations, reserving the last 12 observations to evaluate the 
prediction. The ai(-) and a2(') in (5-3) are unknown functions which are estimated 
by using locally hnear smoothers. Fan and Yao ([l3], P- 327) plot the estimates ai(-) 
and a2{-), which are approximately constant for Xt-2 < 2.7 with ai{Xt-2) ~ 1-3 
and a2{Xt-2) ~ —0.2, and which are approximately linear for Xt-2 > 2.7. For 
comparison. Fan and Yao lj| also fit the following SETAR(2) model to the same 
set of data: 



(5.4) 



Xt = 



0.424 + 1.255Xf_i - 0.348Xf_2, Xt-2 < 2.981, 
1.882 -I- 1.516Xf_i - 1.126Xf_2, Xt-2 > 2.981. 
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Because of the close resemblance of the fitted SETAR(2) and FAR(2,2), they share 
certain ecological interpretations. In particular, the difference of the fitted coeffi- 
cients in each regime can be explained by using the phase dependence and the den- 
sity dependence in the predator-prey structure. The phase dependence refers to the 
difference in the behavior of preys (snowshoe hare) and predators (lynx) in hunting 
and escaping at the decreasing and increasing phases of population dynamics, while 
the density dependence is the relationship between the reproduction rates of the 
animals and their abundance. More discussion on these ecological interpretations 
can be found in [i^l . 

To evaluate the predictions of FAR (2,2), Fan and Yao ([l3|, p. 324) use the 
one-step ahead forecast (denoted by Xt) and the iterative two-step-ahead forecast 
(denoted by Xt), which are defined by 

Xt '■— ai{Xt^2)Xt-i + a2{Xt-2)Xt^2, Xt di{Xt-2)Xt-i + a2{Xt-2)Xt-2- 

The predictions of SETAR(2) are similarly defined. The out-sample prediction ab- 
solute errors {\Xt ~ Xt\ and \Xt — Xt\) of the last 12 observations are reported 
in Table 1. Based on the average of these 12 absolute prediction errors (AAPE), 
FAR(2,2) performs slightly better than SETAR(2). Other nonparametric time se- 
ries models for the Canadian lynx data include the projection pursuit regression 
(PPR) model fitted by Lin and Pourahmadi [si^] who found that SETAR outper- 
forms PPR in terms of one-step-ahead forecasts, and neural network models which 
Kajitani, Hipel and McLeod |25:] found to be "just as good or better than SETAR 
models for one-step out-of-sample forecasting of the lynx data." 

A substantive approach is adopted by Royama ([ilj. Chapter 5). Instead of 
building the statistical model first and using ecology to interpret the fitted model 
later, Royama starts with ecological mechanisms and population dynamics. Letting 
Rt = Xt+i — Xt denote the log reproductive rate from year i to t -I- 1, he consid- 
ers nonlinear dynamics of the form Rt — f{Xt, . . . ,Xt~h+i) + "t, where ut is a 
zero-mean random disturbance, and emphasizes that "our ultimate goal is to deter- 
mine the reproduction surface / and to find an appropriate model which reasonably 
approximates to it," with / satisfying the following two conditions in view of eco- 
logical considerations: There exists X* such that f{X*, . . . ,X*) = 0, and Rt has 
to be bounded above because "no animal can produce infinite number of offspring" 

Table 1 

Absolute prediction errors of one- step- ahead (1 yr) and iterative two- step- ahead (2 yr) forecasts 

and their 12-year average (AAPE). 







Model 


115.311 


Model 




Model 




Model l|5.8a( 






FAR(2,2) 


SETAR(2) 


Logistic 


Logistic-MARS 


Year 


Xt 


1 yr 


2 yr 


1 yr 


2 yr 


1 yr 


2 yr 


1 yr 


2 yr 


1923 


3.054 


0.157 


0.156 


0.187 


0.090 


0.178 


0.075 


0.188 


0.082 


1924 


3.386 


0.012 


0.227 


0.035 


0.269 


0.077 


0.281 


0.057 


0.286 


1925 


3.553 


0.021 


0.035 


0.014 


0.038 


0.057 


0.153 


0.073 


0.120 


1926 


3.468 


0.008 


0.037 


0.022 


0.000 


0.012 


0.077 


0.023 


0.140 


1927 


3.187 


0.085 


0.101 


0.059 


0.092 


0.020 


0.018 


0.122 


0.168 


1928 


2.723 


0.055 


0.086 


0.075 


0.015 


0.128 


0.098 


0.002 


0.159 


1929 


2.686 


0.135 


0.061 


0.273 


0.160 


0.179 


0.004 


0.009 


0.012 


1930 


2.821 


0.016 


0.150 


0.026 


0.316 


0.004 


0.216 


0.010 


0.001 


1931 


3.000 


0.017 


0.037 


0.030 


0.062 


0.005 


0.010 


0.013 


0.025 


1932 


3.201 


0.007 


0.014 


0.060 


0.043 


0.048 


0.042 


0.021 


0.005 


1933 


3.424 


0.089 


0.098 


0.076 


0.067 


0.124 


0.184 


0.066 


0.091 


1934 


3.531 


0.053 


0.175 


0.072 


0.187 


0.083 


0.245 


0.011 


0.087 


AAPE 




0.055 


0.095 


0.073 


0.112 


0.075 


0.117 


0.050 


0.098 
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(see [41[, p. 50, 154, 178). In Chapter 4 of |42], Royama introduces the (first-order) 
logistic model of f{Xt) = — exp{— oq — aiXt-i} to incorporate competition over 
an available resource. Here r„i is the maximum biologically realizable reproduction 
rate, i.e. Rt < Vm for all t] see |42j. Section 4.2.5. An implicit assumption of the 
model is that the resource being depleted during a time step will be recovered to 
the same level by the onset of the next time step. This assumption can be relaxed 
if a linear combination of Xt-j{j = 1, . . . ,h) with /i > 1 is used in the exponential 
term of /, yielding a higher-order logistic model; see [4l|, p. 153. 

Chapter 5 of Royama [41] examines the autocorrelation function and the partial 
autocorrelation function of the Canadian lynx series and concludes that h should 
be set to 2, which corresponds to the model 



(5.5) 



Xt — Xt-i 



exp{-ao - aiXt-i - a2Xt-2} + ut-i, 



where r^, ao,ai and 02 are unknown parameters that need to be estimated; see 
[4l| . p. 190-191. From the scatterplot of Rt-i = Xt — Xt-i versus Xt^2, Royama 
guesses rm ~ 0.6 and X* w 3. He uses this together with trial and error to obtain 
the estimate (?„, ao, ai, 02) = (0.597, 2.526, 0.838, —1.508), but finds that the asym- 
metric cycle of the fitted model does not match the observed cycle from the data 
well. Moreover, the fitted autocorrelation function decays too fast in comparison 
with the sample autocorrelation function. 

Instead of his ad hoc estimates, we can use nonlinear least squares, initialized at 
his estimates, to estimate the parameters of ()5.5p . yielding 



(5.6) Xt ~ Xt-i = 0.460 - cxp{-3.887 - 0.662Xt_i + 1.663Xt_2} + wt„i, 

which implies that the maximum logarithmic reproduction rate is 0.460, i.e., the 
population can grow at most 10°-^^ — 2.884 times per year. Figure 2, top left 




x(t-1) 



Fig 2. Contour plot of Rt-\ = Xt — 
marked by *. The dotted line is Xt—2 
numbered gives the equilibrium X* . 



Xt^l of the logistic model (5. 6}) . The observations are 
= Xt-i- The intersection of this line and the contour 
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corner, shows a negative contour of the response surface of the fitted model (|5.6[) . 
This imphes that the population size can drop sharply in the region Xt-2 > 3.5 
and Xt^i < 2.5, leading to extinction in the upper left part of this region. Whereas 
()5.6p does not rule out the possibility of Xt diverging to — oo, extinction occurs as 
soon as Xt falls below (or equivalently, the population size lO'^' falls below 1). 

Note that one can also derive bounds on the logarithmic reproduction rates 
from the empirical approach. Figure 3 is the plot of the limit cycle generated by the 
skeleton of the fitted model (|5.4p . The limit cycle is of period 8 years. The maximum 
and the minimum logarithmic reproduction rates, attained at years 1 and 5 in 
Figure 3, are 0.212 and -0.269, respectively. That is, the population grows at most 
2Q0.212 _ 1.629 times per year and diminishes by at most a factor of 10~°'^^®=0.538 
per year. Moreover, the limit cycle of (j5.4p implies an infinite loop of expansion and 
contraction and rules out the possibility of extinction. These are consequences of 
adopting an empirical approach because the data are distributed along the main 
diagonal of Figure 2, but not its top left corner nor its lower right corner. In order to 
deduce the behavior of the reproduction rates in these regions, mechanistic modeling 
is essential. On the other hand, the empirical approach uses the observed data better 
and gives more accurate forecasts. Table 1 compares the prediction performance of 
FAR(2,2) and SETAR(2) with that of the logistic model ^J^. The fitted logistic 
model provides the worst AAPE of one-step-ahead and iterative two-step-ahead 
forecasts. Moreover, instead of characterizing the equilibrium with limit cycles, 
the logistic model only gives two equilibrium points, with one corresponding to 
extinction and the other equal to X* — {ao + log(r,„)}/(ai -I- 02) — 3.107 (the 
intersection of the line Xt-i = Xt^2 and the contour of / = in Figure 2.) 

We next apply the combined substantive-empirical approach of Section 4 to these 
data, using the substantive model (5.5) to provide one of the basis functions in the 
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semiparametric model 

/r Xt - Xt-1 = r,„ - exp{-ao - aiXf_i - a2X4_2} 

where g is an unknown function and is a region containing the observations that 
wiU be specified later. Moreover, the difference equation (|5.7|) has the boundary con- 
straint Xt-i+ rm - exp{-ao - aiXt^i - a2Xt-2} + g{Xt~i, Xt-2)I{{Xt-i, Xt-2) e 
S} > 0. The lynx population becomes extinct as soon as this boundary condition 
is violated. Model (|5.7p can be fitted by using the backfitting algorithm. Specifi- 
cally, model (|5.5p is estimated first and then the residuals are used as the response 
variable in nonparametric regression on the predictor variable (Xt_i,Xt_2). The 
difference between the observed Xt — Xt_i and the fitted g is then used as the 
response variable in (|5.5p . whose parameters can be estimated by nonlinear least 
squares. The algorithm of multivariate adaptive regression splines (MARS) devel- 
oped by Friedman (1991) is used for estimating g for the first step in each iteration 
of the above backfitting procedure (the function "mars" in the package of "mda" 
in R can be used). This kind of iteration sheme has been used in fitting partly lin- 
ear models, where the parametric component is a linear regression model and the 
nonparametric component is often fitted by using kernel regression; see 
The fitted response surface is 

1.319 - exp{-0.224 - 0.205Xt_i + 0.343Xt_2} 
+ giXt-i,Xt_2)I{{Xt-i,Xt-2) eS} + ut-i, 
2.2M{Xt-i - 3.224)+(Xt_2 - 2.864)+ 
- 1.572(Xt_i - 3.202)+ - 0.851(Xt_2 - 3.202)+. 

We evaluate this fitted model by using the out-sample prediction criterion. Table 1 
shows that ()5.8a|) gives the smallest AAPE for one-step-ahead forecasts among all 
models considered, and that the AAPE for iterative two-step-ahead forecasts of 
(j5.8ap is comparable to the smallest one provided by FAR(2,2). The region S in 
(|5.8ap is chosen to be the oblique rectangle whose edges are defined by the sample 
means ±3 standard deviations of the principal components of the bivariate sample 
of (A"t_i, A'f_2); see Figure 4 which shows that this region contains not only the 
in-sample data but also the out-sample data. Figure 5 gives the contour plot of 
the fitted model (|5.8ap . The logarithmic growth rate at its top left corner is about 
—2, which shows a strong possibility of extinction even though the magnitude is 
less drastic than that in Figure 2 for (|5.6p . The inclusion of tensor products of 
univariate splines in (I5.8ap would have produced positive probability limits of Xt 
diverging to 00 or to —00 if {Xt^i,Xt-2) had not been confined to a compact 
region. On the other hand, with an absorbing barrier at and with (j5.8bp only 
applicable inside the compact set S, Markov chains of the type (|5.8ap not only have 
stationary distributions but are also geometrically ergodic under mild assumptions 
on the random disturbances ut (e.g., to ensure irreducibity) ; see [39| . 



(5.8a) 



Xt — Xt-i 



(5.8b) 



}{Xt~i, Xt-2) — 



6. Conclusion 



In his concluding remarks. Cox noted that for successful use of statistical models 
in particular applications, "large elements of subject-matter judgment and technical 
statistical expertise are usually essential. Indeed, it is precisely the need for this 
combination that makes our subject such an interesting and demanding one." We 



206 



T. L. Lai and S. P.-S. Wong 



x(t-1) 

Fig 4. The oblique rectangle S formed by ±3 standard deviations away from the sample means 
of the principal components of (Xt—i,Xt—2)- The in-sample and out-sample observations are 
marked by * and o, respectively. 
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Fig 5. Contour plot of Ht-i = Xt — Xt—i of the logistic-MARS model 
are marked by *. The shaded region corresponds to extinction. 



The observations 
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have followed up on his remarks here with a combined subject-matter and statistical 
modeling approach to time series analysis, which we illustrate for the "particular 
applications" of option pricing and population dynamics of the Canadian lynx. In 
particular, for the Canadian lynx data, we have shown how statistical modeling for 
data-rich regions of (Xt-i, Xt-2) can be combined effectively with "subject-matter 
judgment" which is the only reliable guide for sparse-data regions. 
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